ABSTRACT: Biological sciences pose a unique set of engineering challenges due to incomplete understanding of natural biological systems. Currently, sequencing of macromolecules, i.e., DNA (deoxyribonucleic acid) and proteins obtained from living cells, has provided significant information, which is available in different database repositories. These databases, comprising genomic sequences and amino acid sequences (proteins), are utilized in genetic engineering of biological systems to increase the production of chemicals and pharmaceuticals for improving plant and animal health. Recently, synthetic biology approaches have been employed in rational and high-throughput biological engineering to enhance the production of beneficial chemicals. Recent molecular and bioinformatics tools have made it possible to redesign the entire biological cycle, including construction of synthetic DNA inside the cell or replacement of the entire genome to create synthetic organisms by utilizing gene libraries, computational tools and interfaces. This review describes the genomic, proteomic and phylogenetic databases, which may be utilized for designing and manipulating synthetic gene circuits to perform novel functions and produce desired phenotypes in different ecosystems. In addition, synthetic biology approaches are discussed for designing biological systems for production and release of specific metabolic products. The progress and challenges faced in computational methodology and synthetic biology approaches are discussed with regard to their potential applications.
1. INTRODUCTION


Digitalized biological information and databases are doubling every 12 to 18 months with current advances in genomics and proteomics [1]. Recently, there has been enormous progress in unraveling the complexities of naturally occurring biological systems, which has provided abundant scientific information in the field of nucleic acid sequences and protein databases for agriculture, biomedical research, synthetic biology (biological and chemical engineering), and metabolic engineering, including the production of pharmaceuticals and nutraceuticals (Fig. 1) [2-5]. Such interdisciplinary approaches utilize computational and bioinformatics tools to manipulate microorganisms, provide a deeper understanding of complex biological systems and have linked multiple networks of fundamental biological discoveries [6,7]. At the same time, synthetic biology has evolved to generate predictable phenotypes of fairly large networks of molecules and their reactivity, in which the mechanism is governed by several genes and microbial communities [8-10]. The future of systems biology and synthetic biology will involve engineering of entire genomes to create synthetic organisms and ecosystems that are capable of performing novel functions and producing desired phenotypes in different environments [11-13]. Such advances will involve the development of new experimental methods, computational approaches and theoretical as well as conceptual frameworks involving multi-scale modeling and data integration [14-18]. Current scientific progress in biological systems, systematic understanding of complex metabolic regulatory mechanisms and their bioengineering hold enormous potential for improving crop production, human health and an eco-friendly environment [19-22].

Biological databases incorporate data from the disciplines of genomics, microarray gene expression, proteomics, phylogenetics and metabolomics, in addition to details on the configuration, localization, and function of genes together with commonalities between biological sequences. In biochemical engineering, major components of metabolism could be totally redesigned for more efficient use of asset pools or resources to minimize material drains for a sustainable future [6, 23]. Microbial engineering often utilizes natural databases and computational tools at all levels of biological organization and function inside the cell [13,24,25]. Utilizing genome-scale models and optimization algorithms, metabolic network analysis and design of biological circuits may be accomplished




E-ISSN: 2349 5359; P-ISSN: 2454-9967 


[4,26,27]. Additionally, a synthetic DNA construct could be transferred into a microbial strain or living cell, and these structured DNA sequences could give desired levels of transcription and translation to accomplish enhanced protein production [28-30]. Emerging paradigms for computing in living cells may contribute to the development of predictive computational models that could be validated by experimentation and applied across many living host species [8,31]. Biochemical and computer engineers may offer technical solutions for biosynthesis of recombinant proteins for novel sustainable processes as per ecological and economic needs [32-35].


[Figure not reproduced. Labels recovered from the image: Microorganisms; Animals, Humans; NCBI GenBank, EMBL, DDBJ; Protein Data Bank (PDB); Computational Analysis; Modelling; Metabolic]

Fig. 1. Application of DNA and protein databases for production of novel products and improving efficacy of biological systems


2. NUCLEIC ACID SEQUENCING AND DNA 
DATABASES 

The genomic revolution in the last two decades has provided the ability to sequence a cell’s genetic material, i.e., deoxyribonucleic acid (DNA), enabling the effective engineering of biological systems [36]. Nucleic acid constitutes the genetic material of living organisms and is responsible for the transfer of hereditary information from parents to offspring. Nucleic acid exists either as DNA or as ribonucleic acid (RNA; the genetic material in some viruses). DNA is a polymer of nucleotides containing the bases adenine (A), guanine (G), cytosine (C) and thymine (T). The ability to store billions of nucleotide base sequences is an important feature of DNA. The hereditary information present in the nucleotide sequences is maintained intact by complex metabolism involving both DNA replication and repair functions. The different deoxynucleoside triphosphates, i.e., dATP, dGTP, dCTP and dTTP, are polymerized by DNA polymerase enzymes using one of the DNA strands as a template.
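The template-directed copying described above can be illustrated with a short Python sketch. It applies only the standard Watson-Crick pairing rules (A with T, G with C); the function name and the example sequence are illustrative choices and do not come from the review.

```python
# Watson-Crick pairing rules for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_complement(template: str) -> str:
    """Return the newly made strand (5'->3') for a template written 5'->3'."""
    template = template.upper()
    unknown = set(template) - set(COMPLEMENT)
    if unknown:
        raise ValueError(f"unexpected base(s): {sorted(unknown)}")
    # The new strand runs antiparallel to the template, so the template
    # is read backwards while each base is replaced by its partner.
    return "".join(COMPLEMENT[base] for base in reversed(template))


if __name__ == "__main__":
    print(synthesize_complement("ATGCGT"))  # hypothetical template fragment
```

Copying the copy returns the original template, which mirrors how each strand of a duplex can serve as the template for the other.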

The technique used to determine the precise order of nucleotides in a piece of DNA is termed DNA sequencing. Different methods of sequencing are employed to study the arrangement of nucleotides in genomic DNA. For instance, the chemical cleavage method developed by Maxam and Gilbert and the dideoxy chain termination method developed by Sanger are employed for rapid sequencing of long stretches of DNA [37]. More recently, automated DNA sequencing machines capable of identifying 10,000 nucleotide base pairs per day have become commercially available. In the automated systems, detection and analysis of sequencing reactions is carried out by

Divya Sindhu & Saurabh Sindhu 


International Journal of Advanced Science and Engineering


www.mahendrapublications.com 


Int. J. Adv. Sci. Eng. Vol.9 No.4 3085-3098 (2023) 3087 


instruments controlled by computers. In the automated SMRT sequencing method, the nucleotides are labelled differentially with fluorescent dyes that are resolved by a photomultiplier, and the information is stored in the computer. Second generation sequencing technologies included a combination of a synchronized reagent wash of nucleoside triphosphates (NTPs) with a synchronized optical detection method, and also involved the use of 454 FLX, Solexa, scanning tunneling electron microscope (TEM), fluorescence resonance energy transfer (FRET), single molecule detection and protein nanopores. Sequencing speed and throughput were further increased in “third generation sequencing technologies”, which included PacBio SMRT and nanopore sequencing techniques [38]. Thus, next generation sequencing has provided powerful insights into the genetic make-up of the microbial world. Recently, non-canonical nucleobase pairs were developed, which were found to augment the nucleotides present in DNA and RNA [39]. The molecular features needed for informational molecules in biology provided an intellectual framework for technologies to identify alternate genetic systems for life elsewhere in the Universe.


2.1. Nucleic Acid Databases 

The Nucleic Acid Database (NDB) was curated by the Research Collaboratory for Structural Bioinformatics (RCSB). It gives users access to software tools and distributes data for extracting information from nucleic acid structures. The database contains tables of primary and derivative information. The primary information includes atomic coordinates, bibliographic references, crystal data, data collection and other structural descriptions. The derivative information is calculated from the primary information and includes chemical bond lengths and angles, virtual bond lengths and other measures computed according to various algorithms [40,41]. The experimental data in the NDB have been collected from published literature, as well as from standard crystallographic archive file types [40] and other sources. Several programs have been developed to convert among various file formats [42,43]. Shukla et al. [44] showed that the DNA helix exhibits sequence-specific structural properties, which could be exploited by DNA-binding proteins to control transcription. Relevant databases are presented and categorized below as aids in understanding the resources that are available to bioinformatics researchers.
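To make the distinction between primary and derivative information concrete, the following Python sketch derives a bond length and a bond angle from atomic coordinates using elementary vector geometry. It is only an illustration under stated assumptions: the coordinates are invented, and the function names are not part of any NDB software.

```python
import math


def bond_length(a, b):
    """Euclidean distance between two atoms given as (x, y, z) tuples."""
    return math.dist(a, b)


def bond_angle(a, b, c):
    """Angle in degrees at atom b formed by the bonds b-a and b-c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cos_theta = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


if __name__ == "__main__":
    # Hypothetical coordinates (in angstroms), not taken from a real entry.
    atom_a, atom_b, atom_c = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.5, 0.0)
    print(bond_length(atom_a, atom_b), bond_angle(atom_a, atom_b, atom_c))
```

Because such values are fully determined by the coordinates, a database can recompute them whenever the primary data are revised.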


2.2. Nucleic acid-base repositories 

Specific segments of DNA that code for a particular protein or polypeptide are termed genes. An individual person’s genome has about three billion nucleotide bases, which possess the capability to encode about 100,000 genes. Interestingly, these coding regions make up only about 10% of our genome. Moreover, some genes are arranged as clusters known as operons and multigene families. The nucleic acid sequences from different viruses, bacteria, fungi and plants have been deposited in the National Center for Biotechnology Information’s (NCBI’s) GenBank (USA) [45]. Globally, the most widely used large biological databank resources on the World Wide Web include the NCBI’s GenBank, as well as its partners EMBL-Bank (Europe) [2] and the DNA Data Bank of Japan (DDBJ) [46] (Table 1). Other related resources include species-oriented databases such as TAIR [47] and non-coding RNA sequence databases such as Rfam [48]. Database records can be processed with computer programs, and a sequence can be compared with the genomes of other organisms by applying bioinformatics tools. Thus, sequence homology could help in determining the function of the protein or enzyme encoded by a particular gene.
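As a rough illustration of how sequence comparison works, the sketch below implements a small Needleman-Wunsch global alignment and reports percent identity. This is a textbook algorithm chosen for illustration, not the method used by any database named above; production searches normally rely on tools such as BLAST. The scoring values and sequences are assumptions.

```python
def global_alignment(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch alignment; returns (aligned_a, aligned_b, score)."""
    n, m = len(seq_a), len(seq_b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        # Score for aligning residue i of seq_a with residue j of seq_b.
        return match if seq_a[i - 1] == seq_b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(seq_a[i - 1]); out_b.append(seq_b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aligned_a, aligned_b):
    """Share of alignment columns holding the same residue, in percent."""
    if not aligned_a:
        return 0.0
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


if __name__ == "__main__":
    a, b, s = global_alignment("GATTACA", "GATACA")  # toy sequences
    print(a, b, s, round(percent_identity(a, b), 1))
```

A high identity between a new sequence and a characterized gene is one line of evidence for shared function, although it is not proof on its own.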

The biological information encoded in various genes is made available by gene expression. The information is transferred from DNA to mRNA by the process of transcription, and this information is further translated into a sequence of amino acids by the ribosomes. Different amino acids join together through peptide linkages to make various proteins. The sequences of amino acids in various proteins are unique. The arrangement of amino acids in a specific protein determines its primary, secondary and tertiary structures and confers specific functions in the living cell [49,50]. The sequences of different proteins are deposited in protein repositories such as UniProt [51,52] as well as its contributing data repositories, viz. Swiss-Prot [53] and the Protein Information Resource [54-56]. Some of the nucleotide databases along with their respective URLs are listed in Table 1.
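The flow of information from DNA to mRNA to protein can be sketched in a few lines of Python. The example uses the standard genetic code, built from the conventional TCAG ordering of codons; the function names and the input sequence are illustrative assumptions, and the sketch ignores details such as start-site selection, introns and alternative genetic codes.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA corresponding to the coding (sense) DNA strand."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate an mRNA from its first base until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon ends the polypeptide
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCCTGGTAA")  # hypothetical open reading frame
    print(mrna, translate(mrna))
```

The one-letter protein string produced here is the same kind of primary-structure record that the protein repositories store.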


2.3. GenBank, EMBL Bank and DDBJ 
GenBank is the genetic sequence database of the National Institutes of Health (NIH) and it is an annotated collection of all publicly available DNA sequences. The three databases were developed separately, and GenBank and EMBL-Bank were launched in 1980 [2, 45]. After the development of DDBJ [46], their collaboration started. These three databases operate under the direction of the International Nucleotide Sequence Database Collaboration (INSDC) for collecting, maintaining and sharing nucleotide data. Each databank caters to the needs of the region in which it is located [57].


Table 1 Various databases and their respective URLs 


Database | URL | Database | URL
GenBank | http://www.ncbi.nlm.nih.gov/ | dbEST | http://www.ncbi.nlm.nih.gov/dbEST/
EMBL Bank | http://www.ebi.ac.uk/embl/ | Rfam | http://rfam.sanger.ac.uk/
DDBJ | http://www.ddbj-nig.ac.jp | RNA STRAND | http://www.rnasoft.ca/strand/
Ensembl | http://www.ensembl.org/ | fRNAdb | http://www.ncrna.org/frnadb/
TAIR | http://www.arabidopsis.org/ | PIR | http://pir.georgetown.edu/
GeneDB | http://www.genedb.org | SGD | http://www.yeastgenome.org/
BV-BRC | https://www.bv-brc.org/ | BIC | http://bic.jhlab.tw/
CottonMD | http://yanglab.hzau.edu.cn/CottonMD/ | microbioTA | http://bio-annotation.cn/microbiota
BRAD | http://brassicadb.cn | CyanoOmicsDB | http://www.cyanoomics.cn
TFSyntax | https://tfsyntax.zhaopage.com | Ensembl COVID-19 resource | https://covid-19.ensembl.org


2.3.1. Ensembl Genome Database 

The Ensembl database is available as an interactive website or downloadable as flat files. It is a repository of stable, automatically annotated sequences resulting from the Human Genome Project [58]. Ensembl annotates and predicts new genes, with annotation from the InterPro [59] protein family databases and additional annotations from databases of genetic disease (OMIM) [60], serial analysis of gene expression (SAGE) [61] and gene families [62]. Software for Ensembl is freely available and it is based on relational database models [63].


2.3.2. Arabidopsis Information Resource 
The Arabidopsis Information Resource (TAIR) allows information retrieval and data analysis pertaining to the Arabidopsis thaliana genome. A. thaliana is a small annual plant belonging to the mustard family and serves as a model for plant genome investigations. The genome of A. thaliana is completely sequenced, and the database has been designed in a simple and portable manner for efficient utilization by biologists and biotechnologists [47]. Map Viewer is an innovative aspect of the TAIR website; it is an integrated visualization tool for viewing genetic, physical and sequence maps for each Arabidopsis chromosome. Each component of the map contains a hyperlink to an output page from the database, which displays all the information related to this component [47].


2.3.3. Saccharomyces Genome Database 

Saccharomyces cerevisiae is the baker’s and brewer’s yeast, and its genome has been completely sequenced. The Saccharomyces Genome Database (SGD) provides information on its genes, gene-encoded proteins, the structures and biological functions of known gene products, and related literature [64]. The SGD is not a primary sequence repository; rather, it is a collection of DNA and protein sequences from existing databases such as GenBank [45], EMBL-Bank [2], DDBJ [46], the Protein Information Resource (PIR) [51] and Swiss-Prot [53]. The sequences have been organized into datasets to make the data more useful and easily accessible.


2.3.4. GeneDB

GeneDB is a genome database for prokaryotic and eukaryotic organisms [65]. It contains genomic data generated by the Pathogen Sequencing Unit (PSU) at the Wellcome Trust Sanger Institute. The GeneDB database stores and frequently updates sequences and annotations. GeneDB also provides a user interface for easy access, visualization, searching and downloading of the data. In addition, the database architecture allows integration of different biological datasets with



the sequences. GeneDB also facilitates the 
comparisons of species by using structured 
vocabularies. 


2.3.5. dbEST

The dbEST database contains sequence data and other information on short, “single-pass” cDNA sequences, or expressed sequence tags (ESTs), generated from randomly selected library clones [66]. dbEST can be accessed using the Web, from NCBI by anonymous FTP or through Entrez [67]. The BLAST sequence search program at the NCBI website is used to search dbEST nucleotide sequences. dbEST DNA sequences can also be useful for finding novel coding sequences. In addition, EST sequences are available in the FASTA format from the /repository/dbEST directory at ftp.ncbi-nih.gov.
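For readers unfamiliar with the FASTA format mentioned above, the following Python sketch reads FASTA-formatted text into a dictionary. In this format each record starts with a header line beginning with ">" and is followed by one or more sequence lines. The record identifiers and sequences in the example are invented, and the parser is a minimal illustration, not NCBI software.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {identifier: sequence}.

    The identifier is the first whitespace-delimited token after '>'.
    Sequence lines belonging to one record are concatenated.
    """
    records = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(chunks) for name, chunks in records.items()}


if __name__ == "__main__":
    # Hypothetical EST-like records, written only for this example.
    example = """>est_example_1 clone A
ATGGCGTACG
TTAGC
>est_example_2 clone B
GGCATTA
"""
    for name, sequence in parse_fasta(example).items():
        print(name, len(sequence), sequence)
```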


3. PROTEIN DATA BANK AND PROTEIN 
REPOSITORIES 

The biological information encoded in various genes is made available by gene expression [68]. The information is transferred from DNA to mRNA by the process of transcription, and this information is further translated into protein by the ribosomes. The arrangement of the various amino acids in a protein determines its primary structure. The amino acid sequences of various proteins are deposited in the Protein Data Bank (PDB), which is a well-curated data resource used widely in structural biology and biomedical sciences [69]. The PDB was established as a repository for the three-dimensional structures of biological macromolecules [3]. The Research Collaboratory for Structural Bioinformatics (RCSB) maintains the PDB and allows the user to view the data in plain text. Around 750 gigabytes of data are transferred each month from the website, and the RCSB PDB website is accessed by about 250,000 unique visitors per month from 140 countries.

There are three major classes of nucleic acid-containing entries in the PDB archive: RNA, DNA, and protein-nucleic acid complexes. More than 7,000 PDB entries contain carbohydrate polymers and/or individual saccharides. Besides the PDB, there are a large number of repositories and databases used in structural biology, chemistry, the life sciences and large pharmaceutical companies, where they are crucial in the drug discovery process. The PDB allows a wide spectrum of queries through data integration to provide complete information about the features of macromolecular structures. The PDB collects and integrates external data from depositions by scientists, Gene Ontology (GO) [70], the Enzyme Commission, the KEGG Pathway Database [71], and NCBI resources [43]. Data integration is performed by data loaders written in Java, which extract information from existing databases based on common identification numbers. The PDB also allows data extraction at query run time.
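The idea of integrating records through common identification numbers can be sketched as a simple keyed merge. The example below is written in Python for consistency with the other sketches in this review, whereas the text states that the PDB loaders themselves are written in Java; the identifiers, field names and values are all hypothetical.

```python
def integrate_by_id(structures: dict, annotations: dict) -> dict:
    """Merge annotation fields into structure records that share an identifier.

    Records without a matching annotation are kept unchanged, so no
    structure entry is lost during integration.
    """
    merged = {}
    for entry_id, record in structures.items():
        combined = dict(record)  # copy, so the inputs are not modified
        combined.update(annotations.get(entry_id, {}))
        merged[entry_id] = combined
    return merged


if __name__ == "__main__":
    # Hypothetical records keyed by a shared identifier.
    structures = {
        "ID001": {"method": "X-ray", "resolution": 1.8},
        "ID002": {"method": "NMR"},
    }
    annotations = {
        "ID001": {"function_term": "hydrolase activity"},
    }
    for entry_id, record in integrate_by_id(structures, annotations).items():
        print(entry_id, record)
```

A loader of this kind depends on the identifiers being stable and shared across sources, which is why cross-references between databases matter.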

Structural models serve as primary reference data, and these reference data resources act as repositories augmented with database functionality (Table 2). The data repository used in structural biology is the earlier version of the PDB [72]. There are five access sites in the PDB repository, i.e., the wwPDB site, three data centers (PDBe, PDBj and RCSB PDB), and an NMR-specific component, the Biological Magnetic Resonance Data Bank (BMRB) [73]. While the wwPDB site allows data validation and deposition as well as archive download, the remaining sites have more database capabilities that allow for data dissemination. All three PDB data centers utilize the common mmCIF format [74] to store the same underlying structural data; however, the design, information content, and analysis tools of each site are different. In addition to structural information, other information about proteins can be accessed from the Universal Protein Resource (UniProt) [75]. Information about protein location, function and interactions can be accessed from Gene Ontology (GO) [70] or the Kyoto Encyclopedia of Genes and Genomes (KEGG) [71,76]. The PDBsum service applies different tools and resources to provide an overview of a protein structure [77]. It includes analysis of structural attributes such as protein surfaces, cavities and ligands, as well as interaction attributes such as protein-protein, protein-DNA/RNA and protein-small molecule interactions.


4. PHYLOGENETIC DATABASES 

Understanding genomic and proteomic databases and analyzing their relatedness is crucial for understanding biological evolution. Since all biological organisms have developed through the evolutionary process, their patterns, functions, and processes are best analyzed in terms of their phylogenetic histories. The expression of the same gene at a different time or in a different tissue, or its expression resulting in a whole new function along one phylogenetic branch, is usually compared with another branch. These changes along a branch affect the biology of all descendant species, thereby leaving phylogenetic patterns everywhere. A detailed mapping between biological data and phylogenetic histories must be accomplished to realize the full potential of the data accumulation activities [78,79]. Phylogenetic patterns provide information about the differences observed in the efficacy of certain drugs in some species but not in others, and for designing therapies against evolving disease agents such as HIV and influenza. The need to query data using sets of evolutionarily related taxa has spawned the need to create databases that can serve as repositories of phylogenetic trees. Phylogeny and phylogenetic trees give a picture of the evolutionary history among species, individuals or genes. Therefore, there are at least two distinct goals of a phylogenetic database: archival storage and analysis [80]. Major phylogenetic databases and their respective URLs are provided in Table 2.
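The two goals named above, archival storage and analysis, can be illustrated with a toy tree in Python. The sketch stores a tree as nested tuples, serializes its topology in the widely used Newick text format, and lists the taxa under any node, which is the basic operation behind querying by sets of related taxa. The tree itself is a deliberately simplified assumption, and branch lengths are omitted.

```python
def to_newick(node) -> str:
    """Serialize a nested-tuple tree to a Newick topology string (no ';')."""
    if isinstance(node, str):  # a leaf is a taxon name
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


def leaves(node) -> list:
    """Return the taxon names under a node, in left-to-right order."""
    if isinstance(node, str):
        return [node]
    names = []
    for child in node:
        names.extend(leaves(child))
    return names


if __name__ == "__main__":
    # Illustrative topology only; not a published phylogeny.
    animals = ("human", "mouse")
    tree = (animals, ("yeast", "arabidopsis"))
    print(to_newick(tree) + ";")  # archival form
    print(leaves(animals))        # taxa in one clade, for analysis
```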


Table 2 Proteomic and Phylogenetic databases and their URLs 


Proteomic databases
No. | Database | URL | Description
1 | UniProt | http://www.ebi.ac.uk/uniprot/ | -
2 | Swiss-Prot/TrEMBL | http://www.expasy.org/sprot/ | -
3 | TmAlphaFold | https://tmalphafold.ttk.hu/ | AlphaFold TM protein predictions and assessed
4 | COMBATdb | https://db.combat.ox.ac.uk | COVID-19 Multi-omics Blood Atlas
5 | ProtCAD | http://dunbrack2.fccc.edu/protcad | Protein Common Assembly Database
6 | ChromLoops | https://3dgenomics.hzau.edu.cn/chromloops | Protein-mediated chromatin loops
7 | PAT | http://bioinfo.qd.sdu.edu.cn/PAT/ | Prokaryotic Antimicrobial Toxin database

Phylogenetic databases
No. | Database | URL
1 | TreeBASE | http://www.treebase.org/
2 | TreeFam | http://www.treefam.org/
3 | Tree of Life | http://tolweb.org/tree/
4 | NCBI Taxonomy | http://www.ncbi.nlm.nih.gov/taxonomy
5 | SYSTERS | http://systers.molgen.mpg.de/
6 | PANDIT | http://www.ebi.ac.uk/research/goldman/software/pandit/


Due to rapid developments in genomics and proteomics involving novel sequencing technologies, large amounts of biological information are now available in biological databases. Sophisticated computational and bioinformatic analyses using data mining (DM) approaches and phylogenetic profile methods help in understanding the network of biological linkages and in predicting functional interactions between proteins across multiple genomes [49, 81, 82]. The varied applications of phylogenetic tools and algorithms, and the use of in silico methods to study the phylogeny of microbes, may be considered highly reliable and important techniques in biological sciences [33, 83, 84] and may replace wet-laboratory experiments for predicting evolutionary relationships between two microbial species.
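The phylogenetic profile idea can be sketched as follows: each protein is described by a presence/absence vector across a set of genomes, and proteins with similar vectors are proposed as functionally linked. The Python example below uses Jaccard similarity with an arbitrary threshold; the similarity measure, the threshold, the protein names and the profiles are all assumptions made for illustration and are not taken from the cited studies.

```python
def jaccard(profile_a, profile_b) -> float:
    """Jaccard similarity of two presence/absence profiles (1 = present)."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def linked_pairs(profiles: dict, threshold: float = 0.8) -> list:
    """Return (protein, protein, similarity) for pairs at or above threshold."""
    names = sorted(profiles)
    pairs = []
    for index, first in enumerate(names):
        for second in names[index + 1:]:
            similarity = jaccard(profiles[first], profiles[second])
            if similarity >= threshold:
                pairs.append((first, second, similarity))
    return pairs


if __name__ == "__main__":
    # Hypothetical profiles over five genomes.
    profiles = {
        "protA": [1, 1, 0, 1, 0],
        "protB": [1, 1, 0, 1, 0],
        "protC": [0, 0, 1, 0, 1],
    }
    print(linked_pairs(profiles))
```

Proteins that are gained and lost together across genomes, such as protA and protB here, are candidates for participation in the same pathway or complex.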


5. ENGINEERING OF BIOLOGICAL 
DATABASES AND SYNTHETIC BIOLOGY 
Living cells contain DNA and RNA, proteins, enzymes, carbohydrates, lipids, vitamins and minerals. Cells operate as highly complex biological computational systems, which sense their surroundings, interrogate the signals and respond to their environment [85]. Synthetic biology involves the construction and manipulation of biological systems from the level of the smallest molecule (individual functional unit) to the functional cellular level [86,87]. The genomic revolution in the last two decades has provided the ability to sequence a cell’s DNA, enabling the effective engineering of biological systems. Current advances in DNA synthesis and DNA assembly techniques have made it possible to engineer viral and bacterial genomes as well as various metabolic pathways to modify, regulate and control cellular behaviour in a desired manner [88, 89]. In addition, finer control over the expression of particular gene(s) in a given metabolic pathway could be designed and implemented at either the transcriptional or the translational level to enhance production, an approach termed metabolic engineering [14, 27,30]. Recently, synthetic promoters and synthetic enhancers have been designed to produce a desired level of transcriptional strength with recent molecular and experimental tools [90]. Moreover, application of specific experimental and computational tools may regulate the expression of a specific gene or metabolic pathway at both the transcriptional and translational levels for metabolic engineering applications [28,91]. Synthetic biologists leverage engineering design principles involving manipulations at the genomic level to bring the predictability of engineering to the control of complex biological systems. Moreover, synthetic biology (synbio) endeavours to develop artificial cell-free biological systems through the combination of molecular biology and engineering approaches [86].
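To show how control at the transcriptional and translational levels combines, the sketch below uses a common two-stage model of gene expression, in which mRNA is produced at a constant rate and degraded, and protein is produced in proportion to mRNA and degraded. This model is a textbook simplification introduced here as an assumption; it is not taken from the cited studies, and all parameter values are arbitrary.

```python
def steady_state_expression(transcription_rate, translation_rate,
                            mrna_decay, protein_decay):
    """Return steady-state (mRNA, protein) levels for a two-stage model.

    Assumed model (illustrative only):
        d[mRNA]/dt    = transcription_rate - mrna_decay * [mRNA]
        d[protein]/dt = translation_rate * [mRNA] - protein_decay * [protein]
    Setting both derivatives to zero gives the values returned here.
    """
    if mrna_decay <= 0 or protein_decay <= 0:
        raise ValueError("decay rates must be positive")
    mrna = transcription_rate / mrna_decay
    protein = translation_rate * mrna / protein_decay
    return mrna, protein


if __name__ == "__main__":
    # Arbitrary units: a weaker versus a stronger hypothetical promoter.
    for label, promoter_strength in (("weak promoter", 0.5), ("strong promoter", 2.0)):
        mrna, protein = steady_state_expression(promoter_strength, 5.0, 0.5, 0.1)
        print(label, mrna, protein)
```

In this simple picture the steady-state protein level scales with the product of promoter strength and translation rate, so a change at either level shifts the output proportionally.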

Currently, genetic manipulation of various microorganisms, i.e., fungi, bacteria, and yeast, has been carried out using various biotechnology and bioinformatics tools and techniques for biological and chemical engineering [4, 9, 92-94]. New computational approaches have been developed for the construction of synthetic pathways using directed evolution of enzymes, for structuring and developing artificial enzymes (for non-natural reactions) and for re-wiring of host metabolism to alter the metabolic flux for synthesis of non-natural chemical products [6,35]. Qiu et al. [95] proposed novel procedures to improve the survival and activities of microbial inoculants, improvements in microbial delivery strategies and the use of gene editing tools to design and engineer microbial inoculants. In a similar way, multiple biochemical and molecular methodologies were optimized for use in microbiome engineering to enhance beneficial plant-microbiome interactions for improving crop yields [96,97]. In addition, biological engineering for nature-based climate solutions (NbCS) will help in creating a productive, resilient, and proactive “climate-smart agriculture” to mitigate the risks of climate change [98].

In addition to improving agricultural production, the engineering of beneficial microorganisms may enhance the production of therapeutic and pharmaceutical small molecules [34], which depends on optimization of the biochemical pathway and on computational tools developed by metabolic engineers for performing a particular function. This approach was also applied in Escherichia coli for ethanol production by using a quorum sensing module for density-dependent repression (via a toggle switch) of phosphotransacetylase (pta), which caused inactivation of a competing acetate-production pathway [99]. However, the synthetic circuit decreased the yield and differed in behaviour from predictive models. The development of advanced technologies, i.e., multiplex automated genome engineering (MAGE), relied on incorporation of multiple single-strand oligonucleotides introduced via electroporation into daughter cell genomes [100,101]. The application of this technology resulted in a 4-5-fold increase in the production of lycopene as well as aromatic amino acid derivatives [102]. Similarly, Kang et al. [4] used genetic circuit-guided population acclimation of a synthetic microbial consortium including Vibrio sp. and Escherichia coli strains and achieved a 4.3-fold increase in 3-hydroxypropionic acid (3-HP) production during a 48-hour fermentation process.
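A predictive model of a genetic toggle switch, of the general kind mentioned above, can be sketched with two mutually repressing genes. The Python example below integrates the commonly used dimensionless two-repressor equations with a simple Euler scheme. It is not the model used in [99]; the equations are a standard textbook form adopted here as an assumption, and the parameter values are arbitrary.

```python
def simulate_toggle(u0, v0, alpha1=10.0, alpha2=10.0, beta=2.0, gamma=2.0,
                    dt=0.01, steps=5000):
    """Euler integration of a two-repressor toggle switch.

    Assumed dimensionless model (illustrative only):
        du/dt = alpha1 / (1 + v**beta)  - u
        dv/dt = alpha2 / (1 + u**gamma) - v
    u and v are the levels of two repressors that inhibit each other.
    Returns the final (u, v) after `steps` time steps of size `dt`.
    """
    u, v = float(u0), float(v0)
    for _ in range(steps):
        du = alpha1 / (1.0 + v ** beta) - u
        dv = alpha2 / (1.0 + u ** gamma) - v
        u += dt * du
        v += dt * dv
    return u, v


if __name__ == "__main__":
    # The same circuit settles into different states depending on its history.
    print(simulate_toggle(5.0, 0.1))  # starts with repressor u dominant
    print(simulate_toggle(0.1, 5.0))  # starts with repressor v dominant
```

Such idealized models omit metabolic burden, noise and host context, which is one reason an implemented circuit can behave differently from its prediction.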

For improving human health and therapeutic potential, engineering of gut bacteria belonging to Bacteroidetes and Firmicutes was undertaken using different bioinformatics tools [103]. Consequently, synthetic biologists developed a toolkit amenable to engineering of the commensal Bacteroides thetaiotaomicron; it comprises characterized promoters, ribosome binding sites, inducible systems and a CRISPRi platform [104]. Similarly, metabolic engineering may be utilized to engineer novel diagnostic and therapeutic strategies for control of cancer and infectious diseases. For instance, Escherichia coli was engineered to invade mammalian cells selectively in hypoxic environments [105]. In another study, Escherichia coli was engineered to sense the presence of the bacterium Pseudomonas aeruginosa, which has been found to cause infections in the lung, urinary tract, gastrointestinal tract and skin [106]. Furthermore, synthetic biology plays a crucial role in designing and creating genetically engineered living materials (ELMs), comprising cells, microbes, biofilms, and spores, for biotherapeutic applications and represents a new platform for diagnostics and targeted delivery to treat intractable diseases [107]. Similarly, many enzymes that mediate cellular metabolism represent drug targets. The anthranilate phosphoribosyl transferase enzyme (involved in catalyzing tryptophan biosynthesis from chorismate) is considered essential for the growth of Mycobacterium tuberculosis and could act as a drug target [108]. Inhibitors of this enzyme may be useful in treating mycobacterial infections and may help address the problem of multi-drug resistance in M. tuberculosis.

Application of biological databases in synthetic biology aims to design biological systems to a specification, so that they react in a specific manner to an external stimulus to produce specific products. For instance, the combination of high-throughput phenotypic data with precision DNA editing using CRISPR-based tools provided a unique opportunity to link changes in the underlying code to phenotype [109]. Moreover, the current use of novel synthetic biology tools such as machine learning-based metabolic modeling, Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-derived synthetic biology tools, and synthetic genetic circuits has accelerated our systematic understanding of complex metabolic regulatory mechanisms [22]. These tools are widely used to control the metabolism of microorganisms, manipulate gene expression, and build synthetic pathways for bioproduction in industrial bioprocesses.

Using novel synthetic biology tools, semi- 
synthetic organisms may be created in which 
increased genetic information may be stored 
and retrieved. For example, new codon/anti- 
codon pairs may be created to express proteins 
in Escherichia coli containing amino acids that 
are not found in nature [13]. Thus, successful 
production of ‘semi-synthetic organisms’ has 
profound implications for alternative metabolic 
pathways. In addition, novel technologies may
help in the production of new types of
protein-based therapeutics and enzymes for use
in the sustainable industrial synthesis of bulk
chemicals. Recently, synthetic enhancers based
on DNA Oligo Libraries (OLs) were designed and
constructed to regulate gene expression using
the latest commercial OL synthesis
technology [90]. The authors focused
specifically on synthetic-enhancer-based
massively parallel reporter assays, Sort-seq
methodologies (e.g. flow cytometry, deep
sequencing), and machine learning-based
approaches to OL analysis, followed by
validation experiments.
Recently, Benner [39] reviewed the
development of non-canonical nucleobase pairs
that can augment the nucleotides present in
DNA and RNA. In subsequent studies, these
non-canonical nucleobase pairs were used to
build DNA [44]. Interestingly, the synthesized
DNA exhibited sequence-specific structural
properties of the DNA helix, which could be
exploited by DNA-binding proteins to control
transcription.


6. CONCLUSIONS 

Significant information about the DNA 
nucleotide sequences and amino acid sequences 
of proteins, obtained from different laboratories 
worldwide, is currently available in different 
database repositories [1,3]. These biological 
databases are utilized in genetic engineering of 
biological systems to increase the production of 
desired products or chemicals for the benefit of
plants and animals [4,6]. In addition to 
utilization of these biological databases, 
genome-scale models, optimization of 
algorithms, metabolic network analysis and 
bioinformatics tools are used in synthetic 
biology and metabolic engineering to 
completely redesign the genetic circuits within 
a living cell for more efficient utilization of 
resource pools with minimal material drains 
[26,110]. Molecular tools and strategies have
been developed that use in silico and
mathematical modeling of biological
systems to analyze endogenous biological
circuits, with a particular focus on signaling and
metabolic pathways for application in systems
biology [18,111,112]. In addition, computer
simulation allows researchers to build a 
common framework for designing novel 
biological networks for metabolic engineering 
[4,8]. Cell-free systems operating under in vitro
conditions have further contributed to the
analysis, investigation, estimation and
elucidation of dynamic interactions between
genes and proteins in naturally-occurring
systems [29]. Moreover, novel synthetic genetic
constructs could be transferred into a microbial 
strain/living cell, and these designed DNA 
sequences could provide desired levels of 
transcription and translation to achieve 
enhanced protein production [27,28,91]. 

Thus, synthetic biology approaches include
the design, through manipulation at the
genomic level, of biological systems that react
in a specific manner to an external stimulus to
produce specific products. In addition,
computational and bioinformatics tools may
improve the predictability of the respective
mechanisms and techniques for increasing the
production and release of specific metabolic
products. Novel technologies involving
programmable synthetic protocells, advances at
the interface of hardware and wetware such as
solid-phase DNA assembly platforms, and
delivery systems in basic bioscience research
may act as emerging hubs for automation of
biological engineering. Recent powerful 
approaches involving directed evolution of 
valuable new enzymes, designing of synthetic 
genomes using computer-aided design (CAD) 
technologies, automation of specialized
methods for chromosome transfer between 
microbes, plants, and mammalian cells (such as 
cell fusion, genome transplantation, or 
microinjection) and artificial cell research will 
further help in solving the upcoming challenges 
of synthetic biology [113-115]. This highly 
interdisciplinary and international approach of 
biochemical engineering involving government 
and private sectors will help in achieving the 
desired impact in biomedical, pharmaceutical, 
agricultural and chemical industries. 